1 vote
1 answer
158 views
Why are these two implementations of the $\epsilon$-greedy policy different?
According to the book Reinforcement Learning: An Introduction, the epsilon-greedy policy can generally be implemented as: $$ \pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } ...
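For reference, here is a minimal Python sketch of the probability form quoted in the excerpt. The excerpt is cut off, so this assumes the usual textbook definition in which every non-greedy action gets probability $\frac{\epsilon}{|A|}$. The names `epsilon_greedy_probs` and `sample_action` are hypothetical, and breaking ties by lowest index is also an assumption, not something stated in the question.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return pi(a|s) for every action under an epsilon-greedy policy.

    Assumption: ties in q_values are broken by taking the lowest index.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    # Every action receives the exploration share epsilon / |A| ...
    probs = [epsilon / n] * n
    # ... and the greedy action additionally receives 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index according to the probabilities above."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With `q_values = [0.1, 0.5, 0.2]` and `epsilon = 0.3`, the greedy action (index 1) gets roughly 0.8 and each of the other two gets roughly 0.1, so the probabilities sum to 1.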